Calculate polarity at axis #11

Polarity = 1 - 3/5 = 0.4

Calculate polarity at axis #12

Polarity = 1 - 0/5 = 1.0

Calculate polarity at axis #13

Polarity = 1 - 3/5 = 0.4

Calculate polarity at axes #14-21
Calculate polarity at axis #22 



Polarity = 1 - 1/5 = 0.8 
1. Enter input sequence of desired length

2. Enter window size

3. Axis chosen (a)

4. Calculate and output polarity value

5. Move to next axis

6. Calculate and output polarity value around new axis, then... (c)

7. Output ordered list of polarity values

8. Graph these values

9. Statistical analysis of observed vs. predicted

10. Identify regions of extended polarity

(a) Starting at position = (2*window of symmetry)
(b) Polarity = [1-(S/W)]
(c) Up to and including axis position = [2*length - (2*window size)]
(d) Can use a moving average of values (with the number of values averaged and the increment of movement being variable) to smooth the curve
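
The moving average mentioned in the last note leaves both the number of values averaged and the increment of movement open. A minimal Python sketch of such a smoothing step is given here for illustration; the function name `moving_average` and the parameters `n_avg` and `step` are assumptions made for this example and do not come from the original implementation.

```python
def moving_average(values, n_avg, step=1):
    """Average each run of n_avg consecutive values, advancing by step.

    n_avg and step are illustrative parameter names (assumptions).
    """
    if n_avg < 1 or step < 1:
        raise ValueError("n_avg and step must be positive integers")
    last_start = len(values) - n_avg  # last index where a full run still fits
    return [
        sum(values[start:start + n_avg]) / n_avg
        for start in range(0, last_start + 1, step)
    ]


# Hypothetical input values, for illustration only
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], n_avg=3, step=2))
```

A larger `n_avg` gives a smoother curve at the cost of resolution along the sequence; a larger `step` simply thins the output.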
The algorithm was implemented in the PERL programming language.
PERL variable names and function names are in boldface.



import input sequence as a string variable $input_seq

2
prompt user for ($win_sym) length

3
cut a length of (2*$win_sym) bases from the 5' end of $input_seq and assign this substring into $win_seq

perform (2*$win_sym) iterations:
    chop (cut last character off a string and return it) $win_seq
    unshift chopped character (prepend it to the front of an array, moving all elements one step to the right) into an initially empty indexed array (@trgt_fwd)

translate $win_seq so that every G or A or T or C of the original string becomes C or T or A or G, respectively, of the translated string

perform (2*$win_sym) iterations:
    chop the translated $win_seq
    push chopped character (stack it onto the end of an array, without moving the other elements) onto an initially empty indexed array (@trgt_revcomp)

4
assign $match_count = 0
assign index variable $i = 1

perform ($win_sym) iterations:
    compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing.
    increase $i by 1

calculate $asym_count = 1 - ($match_count / $win_sym)
push $asym_count onto initially empty indexed array @axis_list

6
while $input_seq is not empty:
    cut one base off of the 5' end of $input_seq and assign it into $basefeed
    translate $basefeed GATC->CTAG respectively and assign the translated character into $basefeed_comp
    unshift $basefeed_comp (prepend it to the front of an array, moving all elements one step to the right) into array @trgt_revcomp
    pop (remove last element of an array) @trgt_revcomp

    assign $match_count = 0
    assign index variable $i = 1
    perform ($win_sym) iterations:
        compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing.
        increase $i by 1
    calculate $asym_count = 1 - ($match_count / $win_sym)
    push $asym_count onto @axis_list

    shift (remove first element of an array and move all elements one step leftward) @trgt_fwd
    push $basefeed onto array @trgt_fwd

    assign $match_count = 0
    assign index variable $i = 1
    perform ($win_sym) iterations:
        compare between the $i th elements of arrays @trgt_fwd and @trgt_revcomp; if equal then increase $match_count by 1, else nothing.
        increase $i by 1
    calculate $asym_count = 1 - ($match_count / $win_sym)
    push $asym_count onto @axis_list

7
save @axis_list to file
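
The sliding-window procedure described above can be sketched in Python as follows. This is an illustrative translation, not the original PERL program: the function names `polarity` and `polarity_profile` are assumptions, and the two arrays are built directly from the window by slicing rather than by repeated chop, unshift and push. The variable names `fwd` and `revcomp` stand in for @trgt_fwd and @trgt_revcomp.

```python
from collections import deque

# G<->C and A<->T, matching the GATC->CTAG translation in the pseudocode
COMPLEMENT = str.maketrans("GATC", "CTAG")


def polarity(fwd, revcomp, win_sym):
    """1 - (matches / win_sym) over the first win_sym positions."""
    matches = sum(1 for i in range(win_sym) if fwd[i] == revcomp[i])
    return 1 - matches / win_sym


def polarity_profile(input_seq, win_sym):
    """Return the ordered list of polarity values, one per axis."""
    input_seq = input_seq.upper()
    if len(input_seq) < 2 * win_sym:
        raise ValueError("sequence shorter than 2 * window size")
    window = input_seq[:2 * win_sym]
    fwd = deque(window)                                  # forward window
    revcomp = deque(window.translate(COMPLEMENT)[::-1])  # reverse complement
    axis_list = [polarity(fwd, revcomp, win_sym)]
    for basefeed in input_seq[2 * win_sym:]:
        # First half-step: only the reverse-complement array advances
        revcomp.appendleft(basefeed.translate(COMPLEMENT))
        revcomp.pop()
        axis_list.append(polarity(fwd, revcomp, win_sym))
        # Second half-step: the forward array catches up
        fwd.popleft()
        fwd.append(basefeed)
        axis_list.append(polarity(fwd, revcomp, win_sym))
    return axis_list


# GAATTC is its own reverse complement, so the first axis has polarity 0.0
print(polarity_profile("GAATTCG", 3))
```

Because the two arrays advance alternately, each new base yields two polarity values, one for an axis lying on a base and one for an axis lying between two bases.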
101 Enter input sequence of desired length

102 Enter window size

103 Axis chosen (a)

104 Count DPT frequencies, calculate and output DPT residuals and chi square value

105 and 105' Move to next axis

106 and 106' Count DPT frequencies, calculate and output DPT residuals and chi square value around new axis, then... (b)

107 Save to file: ordered arrays of DPT frequencies, DPT residuals and chi square values

108 Further statistical analysis of observed vs. predicted

109 Graph values (c)

110 Identify functional elements

(a) Starting at axis position = (2*window size)
(b) Up to and including axis position = [2*length - (2*window size)]
(c) Values include DPT frequencies, statistical measures including residuals and chi square
The algorithm was implemented in the PERL programming language.
PERL variable names and function names are in boldface.

import input sequence as a string variable ($input_seq)

prompt user for ($win_sym) length

cut a length of (2*$win_sym) bases from the 5' end of $input_seq and assign this substring into $win_seq

perform (2*$win_sym) iterations:
    chop (cut last character off a string and return it) $win_seq
    unshift chopped character (prepend it to the front of an array, moving all elements one step to the right) into an initially empty indexed array (@trgt_fwd)

translate $win_seq so that every G or A or T or C of the original string becomes C or T or A or G, respectively, of the translated string

perform (2*$win_sym) iterations:
    chop the translated $win_seq
    push chopped character (stack it onto the end of an array, without moving the other elements) onto an initially empty indexed array (@trgt_revcomp)

104
assign $gg_count = 0, $ga_count = 0 ... one such variable for each of the 16 DPTs
assign index variable $i = 1

perform ($win_sym) iterations:
    if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
    increase $i by 1

calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.

push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.

calculate chi square: $chi_sq =
    (($gg_res ** 2) / ((1/16) x $win_sym) +
     ($ga_res ** 2) / ((1/16) x $win_sym) +
     ($gt_res ** 2) / ((1/16) x $win_sym) +
     ($gc_res ** 2) / ((1/16) x $win_sym) +
     ($ag_res ** 2) / ((1/16) x $win_sym) +
     ($aa_res ** 2) / ((1/16) x $win_sym) +
     ($at_res ** 2) / ((1/16) x $win_sym) +
     ($ac_res ** 2) / ((1/16) x $win_sym) +
     ($tg_res ** 2) / ((1/16) x $win_sym) +
     ($ta_res ** 2) / ((1/16) x $win_sym) +
     ($tt_res ** 2) / ((1/16) x $win_sym) +
     ($tc_res ** 2) / ((1/16) x $win_sym) +
     ($cg_res ** 2) / ((1/16) x $win_sym) +
     ($ca_res ** 2) / ((1/16) x $win_sym) +
     ($ct_res ** 2) / ((1/16) x $win_sym) +
     ($cc_res ** 2) / ((1/16) x $win_sym))

push $chi_sq onto @chi_sq

while ($input_seq is not empty):
    105
    cut one base off of the 5' end of $input_seq and assign it into $basefeed
    translate $basefeed GATC->CTAG respectively and assign the translated character into $basefeed_comp
    unshift $basefeed_comp (prepend it to the front of an array, moving all elements one step to the right) into array @trgt_revcomp
    pop (remove last element of an array) @trgt_revcomp

    106
    assign $gg_count = 0, $ga_count = 0 ... one such variable for each of the 16 DPTs.
    assign index variable $i = 1
    perform ($win_sym) iterations:
        if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
        increase $i by 1
    calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.
    push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.
    calculate chi square: $chi_sq =
        (($gg_res ** 2) / ((1/16) x $win_sym) +
         ($ga_res ** 2) / ((1/16) x $win_sym) +
         ($gt_res ** 2) / ((1/16) x $win_sym) +
         ($gc_res ** 2) / ((1/16) x $win_sym) +
         ($ag_res ** 2) / ((1/16) x $win_sym) +
         ($aa_res ** 2) / ((1/16) x $win_sym) +
         ($at_res ** 2) / ((1/16) x $win_sym) +
         ($ac_res ** 2) / ((1/16) x $win_sym) +
         ($tg_res ** 2) / ((1/16) x $win_sym) +
         ($ta_res ** 2) / ((1/16) x $win_sym) +
         ($tt_res ** 2) / ((1/16) x $win_sym) +
         ($tc_res ** 2) / ((1/16) x $win_sym) +
         ($cg_res ** 2) / ((1/16) x $win_sym) +
         ($ca_res ** 2) / ((1/16) x $win_sym) +
         ($ct_res ** 2) / ((1/16) x $win_sym) +
         ($cc_res ** 2) / ((1/16) x $win_sym))
    push $chi_sq onto @chi_sq

    105'
    shift (remove first element of an array and move all elements one step leftward) @trgt_fwd
    push $basefeed onto array @trgt_fwd

    106'
    assign $gg_count = 0, $ga_count = 0 ... one such variable for each of the 16 DPTs.
    assign index variable $i = 1
    perform ($win_sym) iterations:
        if the $i th elements of arrays @trgt_fwd and @trgt_revcomp are G and G, respectively, then increase $gg_count by 1. Repeat this conditional operation for each of the 16 possible DPTs.
        increase $i by 1
    calculate residuals: $gg_res = $gg_count - ((1/16) x $win_sym). Repeat this calculation for each of the 16 possible DPTs.
    push $gg_res onto @gg_res. Repeat this step for each of the 16 possible DPTs.
    calculate chi square: $chi_sq =
        (($gg_res ** 2) / ((1/16) x $win_sym) +
         ($ga_res ** 2) / ((1/16) x $win_sym) +
         ($gt_res ** 2) / ((1/16) x $win_sym) +
         ($gc_res ** 2) / ((1/16) x $win_sym) +
         ($ag_res ** 2) / ((1/16) x $win_sym) +
         ($aa_res ** 2) / ((1/16) x $win_sym) +
         ($at_res ** 2) / ((1/16) x $win_sym) +
         ($ac_res ** 2) / ((1/16) x $win_sym) +
         ($tg_res ** 2) / ((1/16) x $win_sym) +
         ($ta_res ** 2) / ((1/16) x $win_sym) +
         ($tt_res ** 2) / ((1/16) x $win_sym) +
         ($tc_res ** 2) / ((1/16) x $win_sym) +
         ($cg_res ** 2) / ((1/16) x $win_sym) +
         ($ca_res ** 2) / ((1/16) x $win_sym) +
         ($ct_res ** 2) / ((1/16) x $win_sym) +
         ($cc_res ** 2) / ((1/16) x $win_sym))
    push $chi_sq onto @chi_sq

107
save @gg_res to file. Repeat this step for each of the 16 possible DPTs.

save @chi_sq to file.
FIGURE 10
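
The per-axis DPT counting, residual and chi square steps can be sketched in Python as follows. This is an illustrative sketch only: the function name `dpt_statistics` and the use of dictionaries in place of sixteen separate scalar variables are assumptions, while the expected frequency of (1/16) x window size per DPT is taken from the pseudocode. The arguments `fwd` and `revcomp` stand in for @trgt_fwd and @trgt_revcomp at one axis.

```python
from itertools import product

BASES = "GATC"
# The 16 possible DPTs, in the order GG, GA, GT, GC, AG, ... CC
DPTS = [first + second for first, second in product(BASES, repeat=2)]


def dpt_statistics(fwd, revcomp, win_sym):
    """Count DPTs over the first win_sym positions, then derive
    residuals and the chi square value for one axis."""
    counts = dict.fromkeys(DPTS, 0)
    for i in range(win_sym):
        pair = fwd[i] + revcomp[i]
        if pair in counts:  # ignore characters other than G, A, T, C
            counts[pair] += 1
    expected = win_sym / 16  # (1/16) x window size
    residuals = {dpt: count - expected for dpt, count in counts.items()}
    chi_sq = sum(res ** 2 / expected for res in residuals.values())
    return counts, residuals, chi_sq


# Hypothetical arrays in which every DPT occurs exactly once,
# so every residual and the chi square value are zero
counts, residuals, chi_sq = dpt_statistics(
    "GGGGAAAATTTTCCCC", "GATCGATCGATCGATC", 16
)
print(chi_sq)
```

Keeping the counts in a dictionary keyed by DPT avoids writing the same conditional sixteen times, while the arithmetic for each DPT stays the same as in the pseudocode.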



